Visualising Data

Leighton Pritchard

6 September 2016

IMAGINE…

  • YOUR FIGURES ARE AMAZING!
  • BUT MISLEADING

A bar chart

Two effectors

  • Knocked out independently
  • Host chlorosis measured

Communication

  • Stories told through figures

Scales matter

  • Indication of quantities

Context matters

  • The same or not the same?

Figures can mislead

Storytelling

  • Figures are what you remember of a ‘story’

  • What about uncertainty?

Another Bar Chart

Four effectors

  • Bacterial effectors
  • Inoculate wild-type plants
  • Measure growth (CFU)

Four bar plots

  • Do the effectors have the same effect?

Add error bars

  • Do the effectors have the same effect?

Error bars

  • Estimates of uncertainty
  • But uncertainty of what?
  • standard deviation (sd):
    • describes the data: how much members of the group differ from the mean
  • standard error (of the mean) (sem):
    • describes the estimate of the mean: the standard deviation of that estimate (sd/√n)
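
A minimal sketch of the difference, using made-up CFU values (the numbers and the name `cfu` are illustrative assumptions, not data from the talk). The sd describes the spread of the measurements; the sem divides that by √n, so it shrinks as replicates are added.

```python
import math
import statistics

# Hypothetical CFU measurements for one treatment (illustrative values only)
cfu = [4.0, 5.0, 6.0, 7.0, 8.0]

n = len(cfu)
mean = statistics.mean(cfu)
sd = statistics.stdev(cfu)   # sample standard deviation (n - 1 denominator)
sem = sd / math.sqrt(n)      # standard error of the mean

print(f"n={n} mean={mean:.2f} sd={sd:.2f} sem={sem:.2f}")
```

An error bar drawn from `sem` will always be shorter than one drawn from `sd` for the same data, which is why a figure legend needs to say which was used.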

SD or SEM?

  • Which was used (& which do you need to know)?

Raw data

  • Are they the same biological responses?

What does mean mean?

  • Same mean implies the same response?

What does mean mean?

  • Unequal sample sizes (cf. barplot)

What does mean mean?

  • Outliers (cf. barplot)

What does mean mean?

  • Bimodal distribution (cf. barplot)
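
A small sketch of why a bar height alone says little. The three groups below are invented for illustration (they are not the talk's data): they share a mean, so their bars would be the same height, yet one is tightly clustered, one is driven by an outlier, and one is bimodal.

```python
import statistics

# Hypothetical groups with the same mean (illustrative values only)
groups = {
    "clustered": [4, 5, 5, 5, 6],
    "outlier": [3, 3, 3, 3, 13],
    "bimodal": [1, 1, 2, 8, 9, 9],
}

for name, values in groups.items():
    print(
        f"{name:>9}: n={len(values)} mean={statistics.mean(values)} "
        f"median={statistics.median(values)} range={min(values)}-{max(values)}"
    )
```

Printing n, median and range next to the mean is a cheap first check, but it is no substitute for plotting the raw points.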

But stats, right?

We only use figures as guides…

  • “Figures tell a story, but we actually only believe the stats”
  • Typical paper:
    • P<0.05, t-test (NHST), a description if you’re lucky
  • Do the distributions support use of a t-test? E.g. assumptions for a 2-sample t-test:
    • both populations Normal
    • equal standard deviations
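
A hedged sketch of why the equal-sd assumption matters. It computes the pooled (equal-sd) two-sample t statistic and, for comparison only, Welch's statistic, which drops that assumption. Welch's variant is not mentioned in the slides, and the two groups are invented for illustration.

```python
import math
import statistics

def student_t(a, b):
    """Two-sample t statistic using a pooled variance (assumes equal sds)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def welch_t(a, b):
    """Two-sample t statistic without the equal-sd assumption (Welch)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / na + vb / nb)

# Hypothetical groups: small and tight vs larger and widely spread
tight = [9, 10, 11]
wide = [0, 2, 4, 6, 8, 10]

print(f"sd tight={statistics.stdev(tight):.2f} sd wide={statistics.stdev(wide):.2f}")
print(f"pooled t={student_t(tight, wide):.2f} Welch t={welch_t(tight, wide):.2f}")
```

When the sds and sample sizes differ this much, the two statistics disagree, so the choice of test (and the P-value that follows) depends on an assumption a bar plot does not show.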

…we trust the P-values

  • Bar plots can hide inappropriate assumptions

Source: Weissgerber et al. (2015)

Figures can mislead

  • reinforce poor practice
    • binary thinking
    • overlooking data distributions and violated assumptions of statistical tests
    • overlooking uncertainty
  • suggest neat stories (P<0.05)
    • data, like life, can be messy

Ways forward

Your analysis?

  • What you did…
    • Open package foo. Click. Click, drag. Click, Click. Undo. Click. Right-click. Save results.csv
    • Load into Excel. Click, drag. Generate graph. Right-click. Save pretty-graph.png

Your analysis?

  • What you said you did in the paper…
    • I analysed my data in foo using the bar analysis. Results are shown in Figure 1.

How reproducible is a mouse click?

Reproducible research

  • Automate (i.e. learn to program)
  • Write code in a (very) high-level language
  • Get some training
  • Use version control
  • Get a code buddy
  • Share code and data openly
  • Write tests
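
As a sketch of "automate" and "write tests" together: a short script can replace the click-and-drag summary step and carry its own known-answer check. The column names `treatment` and `cfu` and the function name `summarise` are assumptions made for this example.

```python
import csv
import io
import statistics

def summarise(csv_text):
    """Return {treatment: (n, mean, sd)} from CSV text with 'treatment' and 'cfu' columns."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["treatment"], []).append(float(row["cfu"]))
    return {
        name: (len(vals), statistics.mean(vals), statistics.stdev(vals))
        for name, vals in groups.items()
    }

def test_summarise():
    # Tiny input whose answer can be checked by hand
    text = "treatment,cfu\nA,1\nA,3\nB,10\nB,10\n"
    result = summarise(text)
    assert result["A"][:2] == (2, 2.0)
    assert result["B"] == (2, 10.0, 0.0)

test_summarise()
```

Kept under version control and shared with the data, a script like this documents exactly what was done, which a sequence of mouse clicks cannot.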

Now what?

Other visualisations

Anscombe’s Quartet

  • Four datasets: near-identical means and standard deviations, very different shapes
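
A quick check of that claim on the y-values published in Anscombe (1973); the dictionary name `quartet` is just a label chosen for this sketch.

```python
import statistics

# y-values of the four datasets from Anscombe (1973)
quartet = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

for name, y in quartet.items():
    print(f"{name:>3}: mean={statistics.mean(y):.2f} sd={statistics.stdev(y):.2f}")
```

To two decimal places the summaries agree across all four datasets, so only a plot of the points reveals how different they are.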

Boxplots

  • Median, interquartile range, outliers
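
A minimal sketch of the numbers a boxplot draws, on invented data. It assumes the common convention that points more than 1.5 × IQR beyond the quartiles are flagged as outliers; quartile and outlier conventions differ between plotting tools, so a library's box may not match these values exactly.

```python
import statistics

# Hypothetical measurements with one extreme value (illustrative only)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]

q1, med, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                  # interquartile range
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(f"median={med} Q1={q1} Q3={q3} IQR={iqr} outliers={outliers}")
```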

Raw data

  • 1D scatterplots

Box and raw data

  • Boxplots and jittered 1D scatterplots

Violin plot

  • Data density estimate
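
The outline of a violin is a smoothed density estimate of the data. A hand-rolled sketch, assuming a Gaussian kernel and a fixed bandwidth of 0.5 (both are choices made for this example; plotting libraries normally pick the bandwidth automatically), on invented bimodal data:

```python
import math

def gaussian_kde(x, data, bandwidth):
    """Density estimate at x: the average of Gaussian bumps centred on each data point."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

# Hypothetical bimodal measurements (illustrative values only)
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
bandwidth = 0.5  # assumed smoothing width

for x in (1.0, 3.0, 5.0):
    print(f"x={x:.1f} density={gaussian_kde(x, data, bandwidth):.3f}")
```

The density is high near 1 and 5 and close to zero at 3, which is where the mean (and the top of a bar) would sit.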

Violin and raw data

  • Stacked, not jittered, data

Acknowledgements

Where do ideas come from?

References

Anscombe, F. J. (1973). “Graphs in Statistical Analysis.” American Statistician, 27(1), 17–21.

Weissgerber, T. L. et al. (2015). “Beyond bar and line graphs: Time for a new data presentation paradigm.” PLoS Biology, 13(4), e1002128. doi:10.1371/journal.pbio.1002128